The airway data set was used here to illustrate how the histogram approach works. Table 2.1 shows the data structure.
Table 2.1 A count matrix for the airway data, obtained after the sequencing reads have been mapped to a reference. Each count is the number of sequencing reads from a sample that mapped to a gene. The gene IDs and the sample IDs have been shortened by removing their common prefixes. For instance, the full ID of gene E003 is ENSG00000000003 and the full ID of sample S08 is SRR1039508.
  S08   S09   S12   S13   S16   S17   S20   S21
  723   486   904   445  1170  1097   806   604
    0     0     0     0     0     0     0     0
  467   523   616   371   582   781   417   509
  347   258   364   237   318   447   330   324
   96    81    73    66   118    94   102    74
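The table can be held in R as a plain numeric matrix with genes in rows and samples in columns, which is the kind of object x refers to in the histogram code. The sketch below is a minimal illustration that assumes the values are typed in by hand from Table 2.1. The gene IDs are not shown in the table, so the row names g1 to g5 are hypothetical placeholders, not the real Ensembl IDs.

```r
# Counts transcribed from Table 2.1 (five genes, eight samples).
# Row names g1..g5 are hypothetical placeholders for the missing gene IDs.
x <- matrix(
  c( 723,  486,  904,  445, 1170, 1097,  806,  604,
       0,    0,    0,    0,    0,    0,    0,    0,
     467,  523,  616,  371,  582,  781,  417,  509,
     347,  258,  364,  237,  318,  447,  330,  324,
      96,   81,   73,   66,  118,   94,  102,   74),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("g", 1:5),
                  c("S08", "S09", "S12", "S13", "S16", "S17", "S20", "S21")))

# x[i, j] is the number of reads from sample j mapped to gene i.
dim(x)            # 5 8
colSums(x)        # reads per sample, summed over these five genes only
rowSums(x == 0)   # number of samples with a zero count for each gene

# Non-zero counts of the first sample (S08), i.e. the values that are
# log-transformed before the histogram is drawn.
x[which(x[, 1] > 0), 1]
```

Storing the counts as a matrix rather than a data frame keeps every column numeric, so column-wise selections such as x[, 1] can be passed directly to log() and hist().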
The code below calls hist to estimate a density function for the first replicate of this data set.
# Histogram of log counts for the first sample (column 1), zeros excluded
hist(log(x[which(x[,1]>0),1]),nclass=50)
In the above code, x was a prepared count matrix for this data set. The density function was estimated from the non-zero count values only, because log(0) is -Inf; hence which(x[,1]>0) was used. The number of bins was set to 50. Figure 2.3 shows the histograms of the logarithm-transformed non-zero count data of two samples (SRR1039508 and SRR1039509).
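To make explicit what hist computes as a density, the sketch below repeats the same steps (drop zeros, take logs, ask for 50 bins) and checks the result against the histogram estimator's definition: the count in a bin divided by the number of observations times the bin width. It is an illustration under an assumption: the input is simulated negative binomial counts standing in for one sample, not the airway data. It uses breaks = 50, which R documents as equivalent to nclass = 50 for a scalar.

```r
set.seed(1)
# Simulated stand-in for one column of a count matrix (NOT the airway data);
# negative binomial draws, some of which are zero.
counts <- rnbinom(5000, size = 0.5, mu = 200)

# Keep only non-zero counts, as in the text, because log(0) is -Inf.
y <- log(counts[counts > 0])

# plot = FALSE returns the histogram object without drawing it.
h <- hist(y, breaks = 50, plot = FALSE)

# Histogram density estimate per bin: count / (n * bin width).
width <- diff(h$breaks)
manual_density <- h$counts / (length(y) * width)
all.equal(h$density, manual_density)

# A density estimate integrates to one over the bins
# (up to floating-point error).
sum(h$density * width)

# Actual number of bins used.
length(h$counts)
```

Because hist treats a single number of breaks as a suggestion and places the break points at "pretty" values, the number of bins it actually uses can differ from the requested 50.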
Figure 2.3 Two histograms of the logarithm-transformed non-zero sequencing counts of samples SRR1039508 and SRR1039509 in the airway data.